Day 17 - Regular expressions - Groups

Solutions to exercises

Exercise 17.01

Extract all the lines of simple.log that contain the HTTP method GET or POST, and rewrite each
line in the form <time> <HTTP status> <HTTP method>. The result for the first 10 lines should be:

10:05:03 200 GET
10:05:43 200 GET
10:05:47 200 GET
10:05:12 200 GET
10:05:07 200 GET
10:05:34 200 GET
10:05:57 200 GET
10:05:50 200 GET
10:05:24 200 GET
10:05:50 200 GET

Solution

$ head simple.log | grep -E " (GET|POST) " | sed -r s,".*[0-9]{4}:(.*)] (GET|POST).*\
HTTP/1.[01] ([0-9]{3}).*","\1 \3 \2",
10:05:03 200 GET
10:05:43 200 GET
10:05:47 200 GET
10:05:12 200 GET
10:05:07 200 GET
10:05:34 200 GET
10:05:57 200 GET
10:05:50 200 GET
10:05:24 200 GET
10:05:50 200 GET
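
The two stages of this pipeline can also be tried in isolation. The following is only a sketch: the three log lines are made up for illustration (the exact format of simple.log is assumed here, not quoted from it), and it uses sed -E, which GNU sed accepts as a synonym for -r. It shows that the spaces around the group reject a line where GET only appears inside another word, and that the three groups are reordered by the back-references.

```shell
# Hypothetical log lines: the real layout of simple.log is an assumption.
sample='192.168.0.1 [12/Mar/2021:10:05:03] GET /index.html HTTP/1.1 200
192.168.0.2 [12/Mar/2021:10:06:10] HEAD /BUDGET.pdf HTTP/1.1 404
192.168.0.3 [12/Mar/2021:10:07:45] POST /login HTTP/1.0 302'

# Without the surrounding spaces, the HEAD line also matches (GET is inside BUDGET).
loose=$(printf '%s\n' "$sample" | grep -cE 'GET|POST')

# With the spaces inside the pattern, only real GET and POST requests remain.
strict=$(printf '%s\n' "$sample" | grep -cE ' (GET|POST) ')

# Group 1 is the time, group 2 the method, group 3 the status;
# the replacement prints them in the order time, status, method.
result=$(printf '%s\n' "$sample" | grep -E ' (GET|POST) ' |
  sed -E 's,.*[0-9]{4}:(.*)\] (GET|POST).*HTTP/1\.[01] ([0-9]{3}).*,\1 \3 \2,')

echo "loose matches: $loose, strict matches: $strict"
printf '%s\n' "$result"
# loose matches: 3, strict matches: 2
# 10:05:03 200 GET
# 10:07:45 302 POST
```

In this sketch the whole sed expression sits in single quotes, so the shell passes it through untouched, and the dot in HTTP/1\. is escaped so that it only matches a literal dot.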

The idea behind this solution is to find all the lines that contain GET or POST using the logical OR
in a group, so that either can be surrounded by spaces, which helps avoid other mentions of
those letters (for example, a line with a URL that contains “BUDGET” or “POSTER”). Then